Quick demo of how to read data from the web using R

Special focus: how to get around your corporate firewall

The TfL APIs

You can access the Transport for London (TfL) Unified API page here. To request data, you will need to register and get an app id and key, which you can do here. Once you have those, have a look through the Unified API site to see the different data streams available.

There is also a series of blog posts that tell you more about the API here.

All you have to do is make a call using a URL. For this you need three things: the structure of the request (which you can see on the Unified API website), your search parameters, and your app id and key.

For example, to get accident data for the year 2015, this is how you would structure your call:

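# Search parameter: the year to request
year <- "2015"
# Credentials you get when you register (replace with your own)
myAppId <- "0f1de667"
myAppKey <- "6c47ac149cfec9ccb6bf32b953a85dd3"

# Assemble the request URL: endpoint, year, then the credentials as query parameters
requestUrl <- paste0("https://api.tfl.gov.uk/AccidentStats/", year, "?app_id=", myAppId, "&app_key=", myAppKey)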

The resulting requestUrl is what you need to return your accident data for 2015. The API returns data in JSON format. If you want to see what that looks like raw, just copy and paste the URL you just created into a web browser. It will look super messy.
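If you want to request more than one year, the URL-building step can be wrapped in a small function. This is only a sketch: build_accident_url is a hypothetical helper name, the credentials are placeholders, and nothing here checks whether the API actually has data for a given year.

```r
# Hypothetical helper (not part of any package): builds the AccidentStats
# request URL from a year and your credentials.
build_accident_url <- function(year, app_id, app_key) {
  base <- "https://api.tfl.gov.uk/AccidentStats/"
  # URLencode() escapes any characters in the credentials that are not URL-safe
  paste0(
    base, year,
    "?app_id=", utils::URLencode(app_id, reserved = TRUE),
    "&app_key=", utils::URLencode(app_key, reserved = TRUE)
  )
}

# Placeholder credentials: substitute your own app id and key.
# The list of years is illustrative only.
urls <- vapply(
  c("2014", "2015"),
  build_accident_url,
  character(1),
  app_id = "YOUR_APP_ID",
  app_key = "YOUR_APP_KEY"
)
print(urls)
```

Each element of urls can then be used in place of requestUrl.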

Reading data from APIs

There are loads of different ways to do this, but here I just use the jsonlite package. If you don’t have it already, you will need to install it and then load it:

install.packages("jsonlite")
library(jsonlite)

This package is essentially “a fast JSON parser and generator optimized for statistical data and the web”. In ideal conditions it is very easy to use for parsing the JSON data returned by the TfL API call. It should take literally one line:

d = fromJSON(requestUrl)

However, if you try this on a corporate computer, you might receive a connection error because of the firewall.

Fortunately, there is a workaround. If you first use readLines to read in the text returned from the URL, and then parse that text within R, it works all right.

l = readLines(requestUrl, encoding="UTF-8", warn=FALSE)
d = fromJSON(l)

And that should work!
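If you want the same script to run both on and off the corporate network, the two approaches can be combined: try the one-line parse first, and only fall back to readLines if it fails. This is a rough sketch, not a guaranteed fix for every firewall setup. read_json_with_fallback is a hypothetical helper name, and a local temporary file stands in for requestUrl so the example runs offline.

```r
library(jsonlite)

# Hypothetical helper: try fromJSON() directly, and fall back to
# readLines() followed by fromJSON() if the direct call errors.
read_json_with_fallback <- function(src) {
  tryCatch(
    fromJSON(src),
    error = function(e) {
      message("Direct fromJSON() failed, retrying via readLines(): ",
              conditionMessage(e))
      lines <- readLines(src, encoding = "UTF-8", warn = FALSE)
      # Collapse to one string so the parser sees a single JSON document
      fromJSON(paste(lines, collapse = "\n"))
    }
  )
}

# Offline demonstration: a temporary file plays the role of requestUrl
tmp <- tempfile(fileext = ".json")
writeLines(
  '[{"id": 242783, "severity": "Slight"}, {"id": 242785, "severity": "Serious"}]',
  tmp
)
d <- read_json_with_fallback(tmp)
print(d)
```

With a real request you would call read_json_with_fallback(requestUrl) instead.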

Your data should look something like this:

| X.type | id | lat | lon | location | date | severity | borough | casualties | vehicles |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Tfl.Api.Presentation.Entities.AccidentStats.AccidentDetail, Tfl.Api.Presentation.Entities | 242783 | 51.518929 | -0.101061 | Long Lane junction with west Smithfield | 2015-10-15T07:31:00Z | Slight | City of London | list($type = "Tfl.Api.Presentation.Entities.AccidentStats.Casualty, Tfl.Api.Presentation.Entities", age = 33, class = "Driver", severity = "Slight", mode = "PedalCycle", ageBand = "Adult") | list($type = c("Tfl.Api.Presentation.Entities.AccidentStats.Vehicle, Tfl.Api.Presentation.Entities", "Tfl.Api.Presentation.Entities.AccidentStats.Vehicle, Tfl.Api.Presentation.Entities"), type = c("PedalCycle", "MediumGoodsVehicle")) |
| Tfl.Api.Presentation.Entities.AccidentStats.AccidentDetail, Tfl.Api.Presentation.Entities | 242784 | 51.511156 | -0.087549 | 122 Cannon Street junction with King William Street | 2015-10-16T20:15:00Z | Slight | City of London | list($type = "Tfl.Api.Presentation.Entities.AccidentStats.Casualty, Tfl.Api.Presentation.Entities", age = 57, class = "Pedestrian", severity = "Slight", mode = "Pedestrian", ageBand = "Adult") | list($type = "Tfl.Api.Presentation.Entities.AccidentStats.Vehicle, Tfl.Api.Presentation.Entities", type = "Car") |
| Tfl.Api.Presentation.Entities.AccidentStats.AccidentDetail, Tfl.Api.Presentation.Entities | 242785 | 51.516805 | -0.081043 | Bishopsgate junction with Liverpool Street | 2015-10-20T15:26:00Z | Serious | City of London | list($type = "Tfl.Api.Presentation.Entities.AccidentStats.Casualty, Tfl.Api.Presentation.Entities", age = 34, class = "Driver", severity = "Serious", mode = "PoweredTwoWheeler", ageBand = "Adult") | list($type = c("Tfl.Api.Presentation.Entities.AccidentStats.Vehicle, Tfl.Api.Presentation.Entities", "Tfl.Api.Presentation.Entities.AccidentStats.Vehicle, Tfl.Api.Presentation.Entities"), type = c("Motorcycle_50_125cc", "Motorcycle_500cc_Plus")) |
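Notice that the casualties and vehicles columns are nested: each cell holds its own little table. Here is a rough sketch of how you might summarise them into ordinary columns. The JSON string is a trimmed-down sample based on the first two rows above (most fields are left out), and the column names n_vehicles and casualty_modes are my own.

```r
library(jsonlite)

# Trimmed-down sample mirroring the first two records shown above
sample_json <- '[
  {"id": 242783, "severity": "Slight",
   "casualties": [{"age": 33, "mode": "PedalCycle"}],
   "vehicles": [{"type": "PedalCycle"}, {"type": "MediumGoodsVehicle"}]},
  {"id": 242784, "severity": "Slight",
   "casualties": [{"age": 57, "mode": "Pedestrian"}],
   "vehicles": [{"type": "Car"}]}
]'
d <- fromJSON(sample_json)

# casualties and vehicles are list columns: one small data frame per accident.
# Count the vehicles involved in each accident.
d$n_vehicles <- vapply(d$vehicles, nrow, integer(1))

# Collapse the casualty modes for each accident into a single string
d$casualty_modes <- vapply(
  d$casualties,
  function(x) paste(x$mode, collapse = ", "),
  character(1)
)

print(d[, c("id", "severity", "n_vehicles", "casualty_modes")])
```

The same two vapply calls should work on the full data frame returned by the API, assuming every accident has at least one casualty and one vehicle recorded.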

You can now do something with this data, like make a map: